High-dimensional regression
Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality
Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data. Information-efficient learning algorithms minimize residual information while maximizing the relevant bits, which are predictive of the unknown generative models. We solve this optimization to obtain the information content of optimal algorithms for a linear regression problem and compare it to that of randomized ridge regression. Our results demonstrate the fundamental trade-off between residual and relevant information and characterize the relative information efficiency of randomized regression with respect to optimal algorithms.
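To make the object of study concrete, here is a minimal sketch (not the authors' code) of randomized ridge regression: an ordinary ridge estimator with isotropic Gaussian noise injected into the fitted weights, which is the mechanism that caps how much training noise the model can memorize. The problem sizes, the regularization `lam`, and the noise scale `tau` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-Gaussian problem: y = X @ w_star + observation noise.
n, p, sigma = 50, 20, 0.5
X = rng.standard_normal((n, p))
w_star = rng.standard_normal(p)
y = X @ w_star + sigma * rng.standard_normal(n)

def randomized_ridge(X, y, lam, tau, rng):
    """Ridge estimator plus isotropic Gaussian noise of scale tau.

    The injected noise limits how many bits the fitted weights can
    retain about the training set, including its noise (the residual
    information in the abstract's terminology).
    """
    p = X.shape[1]
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    return w_ridge + tau * rng.standard_normal(p)

w_hat = randomized_ridge(X, y, lam=1.0, tau=0.1, rng=rng)
print("distance to true weights:", np.linalg.norm(w_hat - w_star))
```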
Interpretation of High-Dimensional Regression Coefficients by Comparison with Linearized Compressing Features
Joachim Schaeffer, Jinwook Rhyu, Robin Droop, Rolf Findeisen, Richard Braatz
Linear regression is often deemed inherently interpretable; however, challenges arise for high-dimensional data. We focus on further understanding how linear regression approximates nonlinear responses from high-dimensional functional data, motivated by predicting cycle life for lithium-ion batteries. We develop a linearization method to derive feature coefficients, which we compare with the closest regression coefficients along the path of regression solutions. We showcase the methods on battery data case studies where a single nonlinear compressing feature, $g\colon \mathbb{R}^p \to \mathbb{R}$, is used to construct a synthetic response, $y \in \mathbb{R}$. This unifying view of linear regression and compressing features for high-dimensional functional data helps to understand (1) how regression coefficients are shaped in the highly regularized domain and how they relate to linearized feature coefficients and (2) how the shape of regression coefficients changes as a function of regularization to approximate nonlinear responses by exploiting local structures.
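The comparison at the heart of this abstract can be sketched as follows, under illustrative assumptions: a toy compressing feature $g$ (here the log of a curve's mean power, standing in for a battery degradation feature), linearized feature coefficients obtained as a finite-difference gradient of $g$ at the data mean, and ridge coefficients computed along a small regularization path and compared by cosine similarity. None of the specific choices below come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "functional" data: n curves sampled at p points, centered near 2
# so that a first-order Taylor expansion of g is informative.
n, p = 200, 100
X = 2.0 + 0.5 * rng.standard_normal((n, p))

def g(x):
    """Illustrative nonlinear compressing feature g: R^p -> R."""
    return np.log(np.mean(x ** 2))

y = np.array([g(x) for x in X])

# Linearized feature coefficients: finite-difference gradient of g
# at the data mean (works for any black-box g).
x_bar = X.mean(axis=0)
eps = 1e-5
grad = np.array([(g(x_bar + eps * e) - g(x_bar - eps * e)) / (2 * eps)
                 for e in np.eye(p)])

# Ridge path: compare regression coefficients with the linearized ones.
Xc, yc = X - x_bar, y - y.mean()
for lam in [1e-2, 1e0, 1e2]:
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    cos = beta @ grad / (np.linalg.norm(beta) * np.linalg.norm(grad))
    print(f"lambda={lam:g}: cosine(beta, grad g at mean) = {cos:.3f}")
```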
Robust Variable Selection for High-dimensional Regression with Missing Data and Measurement Errors
The linear relationship between response variables and covariates has long been a topic of interest. The classical squared loss function usually assumes that the data obey a normal distribution. However, the data discussed in this paper contain a large number of missing values and measurement errors, so that they typically do not conform to any common distributional form. We propose a method based on an exponential squared loss function with a tuning parameter $h$. For data with different distributions, a better linear regression fit can be achieved by changing the value of $h$; for any data distribution, the loss function remains strongly robust for $h \in (0, +\infty)$. In previous studies using the traditional squared loss function, the distributional requirements on the data are very stringent, and the traditional squared loss is highly sensitive to anomalies. This reduces the estimation efficiency of the model, a drawback that becomes more pronounced when the data contain missing values with measurement errors. In contrast, the exponential squared loss can improve estimation efficiency by varying the tuning parameter $h$, adapting to a wider range of data distributions and producing more reliable estimates.

The traditional squared loss function assumes the covariates are free of missing values and measurement errors; even when such defects exist, they are either treated as absent or the affected data are removed. However, this assumption is often violated in disciplines such as health and epidemiology. As an illustration, Zhang and Zhou (1) studied a collection of breast cancer patients to identify the gene expression associated with long-term disease-free survival. The data set consists of 24,481 gene probes collected from 78 breast cancer patients. In particular, using the log-ratio values, $\log_{10}(\text{Ratio})$, denoted $Y$, it is possible to forecast disease-free survival. In practice, gene sensors inevitably introduce measurement errors, and in this breast cancer data set the $\log_{10}(\text{Ratio})$ values also contain missing entries.

When a data set contains a large number of missing values and measurement errors, ignoring them and estimating with the traditional squared loss greatly degrades the model's estimation accuracy because of the chaotic data distribution, resulting in significant estimation bias. In the above data set, we find that employing the traditional squared loss function, which handles data with measurement errors and …
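As a minimal illustration of the core idea (robust fitting only; the paper's full estimator additionally handles missingness and measurement errors), the sketch below fits linear regression under the exponential squared loss $\ell_h(r) = 1 - \exp(-r^2/h)$, whose bounded per-observation contribution limits the influence of outliers. The data, the contamination, and the choice $h = 1.0$ are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Linear data with heavy contamination (gross outliers stand in for the
# chaotic residuals induced by missingness and measurement errors).
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ beta_true + 0.3 * rng.standard_normal(n)
y[:20] += 15.0 * rng.standard_normal(20)   # gross outliers

def exp_squared_objective(beta, X, y, h):
    """Exponential squared loss: sum_i (1 - exp(-r_i^2 / h)).

    Bounded in the residual, so a single outlier contributes at most 1
    to the objective instead of r^2 as in ordinary least squares.
    """
    r = y - X @ beta
    return np.sum(1.0 - np.exp(-r ** 2 / h))

h = 1.0                                        # tuning parameter, h in (0, +inf)
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS warm start
fit = minimize(exp_squared_objective, beta0, args=(X, y, h), method="BFGS")

print("OLS   :", np.round(beta0, 2))
print("ExpSq :", np.round(fit.x, 2))
print("truth :", beta_true)
```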
Randomized tests for high-dimensional regression: A more efficient and powerful solution
We investigate the problem of testing the global null in high-dimensional regression models when the feature dimension p grows proportionally to the number of observations n. Despite a number of prior works studying this problem, whether there exists a test that is model-agnostic, efficient to compute, and enjoys high power remains unsettled. In this paper, we answer this question in the affirmative by leveraging random projection techniques, and propose a testing procedure that blends the classical F-test with a random projection step. When combined with a systematic choice of the projection dimension, the proposed procedure is proved to be minimax optimal and, meanwhile, reduces the computation and data storage requirements. We illustrate our results in various scenarios where the underlying feature matrix exhibits an intrinsic lower-dimensional structure (such as approximate low rank or exponential/polynomial eigen-decay), and it turns out that the proposed test achieves sharp adaptive rates.
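A minimal sketch of the blended procedure, under illustrative assumptions: project the p features onto a random m-dimensional subspace with a Gaussian matrix, then apply the classical F-test for the global null on the projected design. The fixed choice m = 20 below is a placeholder; the paper's systematic, minimax-optimal rule for the projection dimension is not reproduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def projected_f_test(X, y, m, rng):
    """F-test of the global null beta = 0 after a random projection.

    Projects the p-column design onto m Gaussian directions, then runs
    the classical F-test on the m-column projected design.
    """
    n, p = X.shape
    P = rng.standard_normal((p, m)) / np.sqrt(m)
    Z = X @ P                                  # n x m projected design
    # Residual sums of squares under the null (no features) and after
    # least-squares fitting on the projected features.
    rss0 = y @ y
    beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta_hat
    rss1 = resid @ resid
    F = ((rss0 - rss1) / m) / (rss1 / (n - m))
    return F, stats.f.sf(F, m, n - m)

# p grows proportionally with n (here p = n); the null holds: pure noise.
n = 300
X = rng.standard_normal((n, n))
y = rng.standard_normal(n)
F, pval = projected_f_test(X, y, m=20, rng=rng)
print(f"F = {F:.2f}, p-value = {pval:.3f}")    # large p-value expected
```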
Debiased high-dimensional regression calibration for errors-in-variables log-contrast models
Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work tackles the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data that are mismeasured or contaminated. We introduce a calibration approach tailored to the linear log-contrast model. Under relatively lenient conditions on the sparsity level of the parameter, we establish the asymptotic normality of the estimator for inference. Numerical experiments and an application to a microbiome study demonstrate the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability to a wide range of research contexts.
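For orientation, here is a minimal sketch of the error-free linear log-contrast model that the calibration targets: responses regressed on log-compositions under a sum-to-zero coefficient constraint (which makes the model scale-invariant), fitted by constrained least squares via the KKT system. It omits the paper's actual contributions (measurement-error calibration, debiasing, high-dimensional inference), and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Compositional covariates: each row of X lies on the simplex; the
# linear log-contrast model regresses y on log(X) with sum(beta) = 0.
n, p = 150, 8
W = rng.gamma(shape=2.0, scale=1.0, size=(n, p))
X = W / W.sum(axis=1, keepdims=True)       # compositions
Z = np.log(X)

beta_true = np.array([1.0, -1.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0])  # sums to 0
y = Z @ beta_true + 0.2 * rng.standard_normal(n)

# Constrained least squares via KKT: minimize ||y - Z b||^2 s.t. 1'b = 0.
ones = np.ones(p)
kkt = np.block([[Z.T @ Z, ones[:, None]],
                [ones[None, :], np.zeros((1, 1))]])
rhs = np.concatenate([Z.T @ y, [0.0]])
beta_hat = np.linalg.solve(kkt, rhs)[:p]

print("sum of estimate:", beta_hat.sum())  # ~0 by construction
print("estimate:", np.round(beta_hat, 2))
```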